Make sure your have rgdal, rgeos, raster, ggmap, leaflet and sp installed:
library('rgdal')
library('sp')
library('raster')
library('rgeos')
library('leaflet')
library('ggmap')
spPackage sp is the central package supporting spatial data analysis in R. sp defines a set of classes to represent spatial data. A class defines a particular data type. The data.frame is an example of a class. Any particular data.frame you create is an object (instance) of that class.
In fact, the sp package does not provide many functions to modify or analyse spatial data; but the classes it defines are used in >100 other R packages that provide specific functionalities. The classes sp defines are mostly starting with Spatial*. For vector data, the basic types are the SpatialPoints, SpatialLines, and SpatialPolygons. These classes only represent geometries (i.e. the spatial information). To link the spatial information to a data-table, classes are available with these names plus DataFrame, for example, SpatialPolygonsDataFrame and SpatialPointsDataFrame.
Consider the following vectors with coordinate information:
longitude <- c(4.4543, 5.02789, 4.94202, 4.49238, 4.49054, 4.54044, 5.95192,
4.49496, 4.4958, 5.68327, 4.49054, 4.49054, 4.4958, 4.48938)
latitude <- c(51.33847, 51.24824, 51.24325, 51.26218, 51.25701, 51.27518,
51.30803, 51.25803, 51.26106, 51.04185, 51.25701, 51.25701,
51.26106, 51.25955)
lonlat <- cbind(longitude, latitude)
Besides the numbers itself, the variable lonlat is not aware of any spatial context (it is just a matrix object).
pts <- SpatialPoints(lonlat)
class(pts)
## [1] "SpatialPoints"
## attr(,"package")
## [1] "sp"
We converted the vector to an object of SpatialPoints, which contains, beside the matrix data, additional information:
showDefault(pts)
## An object of class "SpatialPoints"
## Slot "coords":
## longitude latitude
## [1,] 4.45430 51.33847
## [2,] 5.02789 51.24824
## [3,] 4.94202 51.24325
## [4,] 4.49238 51.26218
## [5,] 4.49054 51.25701
## [6,] 4.54044 51.27518
## [7,] 5.95192 51.30803
## [8,] 4.49496 51.25803
## [9,] 4.49580 51.26106
## [10,] 5.68327 51.04185
## [11,] 4.49054 51.25701
## [12,] 4.49054 51.25701
## [13,] 4.49580 51.26106
## [14,] 4.48938 51.25955
##
## Slot "bbox":
## min max
## longitude 4.45430 5.95192
## latitude 51.04185 51.33847
##
## Slot "proj4string":
## CRS arguments: NA
So we see that the object has the coordinates we supplied, but also a bbox. This is a ‘bounding box’, or the ‘spatial extent’ that was computed from the coordinates. There is also a proj4string. This stores the coordinate reference system (CRS). We did not provide the crs so it is unknown (NA). That is not good, so let’s recreate the object, and now provide a crs. We’ll come back to the projection definition later:
crs_wgs84 <- CRS("+init=epsg:4326")
pts <- SpatialPoints(lonlat, proj4string = crs_wgs84)
showDefault(pts)
## An object of class "SpatialPoints"
## Slot "coords":
## longitude latitude
## [1,] 4.45430 51.33847
## [2,] 5.02789 51.24824
## [3,] 4.94202 51.24325
## [4,] 4.49238 51.26218
## [5,] 4.49054 51.25701
## [6,] 4.54044 51.27518
## [7,] 5.95192 51.30803
## [8,] 4.49496 51.25803
## [9,] 4.49580 51.26106
## [10,] 5.68327 51.04185
## [11,] 4.49054 51.25701
## [12,] 4.49054 51.25701
## [13,] 4.49580 51.26106
## [14,] 4.48938 51.25955
##
## Slot "bbox":
## min max
## longitude 4.45430 5.95192
## latitude 51.04185 51.33847
##
## Slot "proj4string":
## CRS arguments:
## +init=epsg:4326 +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84
## +towgs84=0,0,0
Hence, the projection information is provided and our geometry is aware of the spatial context.
Remark: Assigning a CRS is like labeling something. You need to provide the label that corresponds to the item. Not to what you would like it to be. The latter requires a transformation.
The definition of the geometry as an object of the SpatialPoints class, provides a number of spatial functionalities that are not directly available to default matrices. As said in the introduction, the sp class mainly focuses on the classes (objects) and not on the functionalities. However, an essential functionality provided by sp itself (apart from over and aggregate), is the transformation of the coordinate reference system (CRS):
crs_lambert <- CRS("+init=epsg:31370")
pts_lambert <- spTransform(pts, crs_lambert)
showDefault(pts_lambert)
## An object of class "SpatialPoints"
## Slot "coords":
## longitude latitude
## [1,] 155961.2 225412.3
## [2,] 196022.8 215574.7
## [3,] 190031.5 214969.8
## [4,] 158629.2 216928.2
## [5,] 158501.7 216352.8
## [6,] 161980.6 218381.2
## [7,] 260392.8 223200.1
## [8,] 158810.1 216466.8
## [9,] 158868.1 216804.0
## [10,] 242186.7 193225.7
## [11,] 158501.7 216352.8
## [12,] 158501.7 216352.8
## [13,] 158868.1 216804.0
## [14,] 158420.3 216635.3
##
## Slot "bbox":
## min max
## longitude 155961.2 260392.8
## latitude 193225.7 225412.3
##
## Slot "proj4string":
## CRS arguments:
## +init=epsg:31370 +proj=lcc +lat_1=51.16666723333333
## +lat_2=49.8333339 +lat_0=90 +lon_0=4.367486666666666
## +x_0=150000.013 +y_0=5400088.438 +ellps=intl
## +towgs84=-106.8686,52.2978,-103.7239,0.3366,-0.457,1.8422,-1.2747
## +units=m +no_defs
Other (geometric) operations are provided by (a huge amount of) additional R Packages. The rgeos package is an important package to check for these functionalities. As an example: adding a buffer to our SpatialPoints object:
# gbuffer from the rgeos package (see later)
pts_lambert_buffer <- gBuffer(pts_lambert, byid = TRUE, width = 2000)
Hence, the result is again a Spatial* object, i.e. a SpatialPolygons object. A plot of the pts_lambert and the pts_lambert_buffered together looks as follows:
plot(pts_lambert_buffer, border = 'blue',
col = 'yellow', lwd = 3, axes = TRUE)
plot(pts_lambert, add = TRUE, col = 'red', las = 1)
For quick data exploration of such a small data set, an interactive visualization provides better insight. The leaflet package is well-documented and easy to use, certainly when the dplyr piping is familiar to use:
wgs_84 <- CRS("+init=epsg:4326")
leaflet() %>%
addTiles() %>% # provide a default openstreetmap background layer to the image
addCircles(data = pts) %>% # Add the points to the map
addPolygons(data = spTransform(pts_lambert_buffer, wgs_84)) # Add the buffer polygons to the map
EXERCISE
Check the documentation of the leaflet Package and adapt the following items to the leaflet map:
(1) The coordinates should be replaced by an icon marker instead of just a circle
(2) The buffers should not be filled and have a red color.
The output should look like:
The geometry classes provided by sp contain the spatial information, but do not add data to these points (or lines/polygons). The data attribute table is typically provided as a data.frame. For this reason, sp also provides classes that link the geometries with DataFrames: SpatialPointsDataFrame, SpatialLinesDataFrame and SpatialPolygonsDataFrame.
Let’s assume that for each of the coordinates defined above, we also have information about an identifier and a scientificName:
identifiers <- c('INBO:NBN:BFN00179000029DL', 'INBO:NBN:BFN001790000A584',
'INBO:NBN:BFN001790000AACI', 'INBO:NBN:BFN0017900009DOC',
'INBO:NBN:BFN001790000A4J0', 'INBO:NBN:BFN0017900009DO8',
'INBO:NBN:BFN001790000AACF', 'INBO:NBN:BFN001790000A4JO',
'INBO:NBN:BFN001790000A4K8', 'INBO:NBN:BFN001790000A51G',
'INBO:NBN:BFN001790000A4J3', 'INBO:NBN:BFN001790000A4J5',
'INBO:NBN:BFN001790000A4K6', 'INBO:NBN:BFN001790000A4L5')
scientific_names <- c('Lagarosiphon major', 'Lagarosiphon major',
'Muntiacus reevesii', 'Muntiacus reevesii',
'Muntiacus reevesii', 'Muntiacus reevesii',
'Muntiacus reevesii', 'Muntiacus reevesii',
'Muntiacus reevesii', 'Muntiacus reevesii',
'Muntiacus reevesii', 'Muntiacus reevesii',
'Muntiacus reevesii', 'Muntiacus reevesii')
recorded <- as.data.frame(cbind(identifiers, scientific_names))
head(recorded)
## identifiers scientific_names
## 1 INBO:NBN:BFN00179000029DL Lagarosiphon major
## 2 INBO:NBN:BFN001790000A584 Lagarosiphon major
## 3 INBO:NBN:BFN001790000AACI Muntiacus reevesii
## 4 INBO:NBN:BFN0017900009DOC Muntiacus reevesii
## 5 INBO:NBN:BFN001790000A4J0 Muntiacus reevesii
## 6 INBO:NBN:BFN0017900009DO8 Muntiacus reevesii
Hence, the SpatialPointsDataFrame combines the spatial information (pts_lambert) with the recorded data (recorded):
pts_recorded <- SpatialPointsDataFrame(pts_lambert, data = recorded)
pts_recorded
## class : SpatialPointsDataFrame
## features : 14
## extent : 155961.2, 260392.8, 193225.7, 225412.3 (xmin, xmax, ymin, ymax)
## coord. ref. : +init=epsg:31370 +proj=lcc +lat_1=51.16666723333333 +lat_2=49.8333339 +lat_0=90 +lon_0=4.367486666666666 +x_0=150000.013 +y_0=5400088.438 +ellps=intl +towgs84=-106.8686,52.2978,-103.7239,0.3366,-0.457,1.8422,-1.2747 +units=m +no_defs
## variables : 2
## names : identifiers, scientific_names
## min values : INBO:NBN:BFN00179000029DL, Lagarosiphon major
## max values : INBO:NBN:BFN001790000AACI, Muntiacus reevesii
For the moment, the take home message is that Spatial***DataFrame bring together the attribute data (data.frame), the coordinates (geometry) and the coordinate reference system (crs):
head(pts_recorded@data)
## identifiers scientific_names
## 1 INBO:NBN:BFN00179000029DL Lagarosiphon major
## 2 INBO:NBN:BFN001790000A584 Lagarosiphon major
## 3 INBO:NBN:BFN001790000AACI Muntiacus reevesii
## 4 INBO:NBN:BFN0017900009DOC Muntiacus reevesii
## 5 INBO:NBN:BFN001790000A4J0 Muntiacus reevesii
## 6 INBO:NBN:BFN0017900009DO8 Muntiacus reevesii
head(pts_recorded@coords)
## longitude latitude
## [1,] 155961.2 225412.3
## [2,] 196022.8 215574.7
## [3,] 190031.5 214969.8
## [4,] 158629.2 216928.2
## [5,] 158501.7 216352.8
## [6,] 161980.6 218381.2
pts_recorded@proj4string
## CRS arguments:
## +init=epsg:31370 +proj=lcc +lat_1=51.16666723333333
## +lat_2=49.8333339 +lat_0=90 +lon_0=4.367486666666666
## +x_0=150000.013 +y_0=5400088.438 +ellps=intl
## +towgs84=-106.8686,52.2978,-103.7239,0.3366,-0.457,1.8422,-1.2747
## +units=m +no_defs
In the previous section, we already used the CRS class definition to specify the coordinate reference system. As many other systems, R uses the PROJ.4 notation to define the CRS. The number of parameters depends on the projection. However, the easiest way is (mostly) using the EPSG code, for example:
But other options are available as well:
CRS("+proj=utm +zone=32")
## CRS arguments: +proj=utm +zone=32 +ellps=WGS84
In general, the definition of the CRS (input argument) should ba a string setup up by one or more +<arg>=<value> combinations, each of them separated by a blank value: +<arg1>=<value1> +<arg2>=<value2>. The CRS-creation returns an object (with a variable name you can assign):
wgs_84 <- CRS("+init=epsg:4326")
wgs_lambert <- CRS("+init=epsg:31370")
wgs_84
## CRS arguments:
## +init=epsg:4326 +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84
## +towgs84=0,0,0
wgs_lambert
## CRS arguments:
## +init=epsg:31370 +proj=lcc +lat_1=51.16666723333333
## +lat_2=49.8333339 +lat_0=90 +lon_0=4.367486666666666
## +x_0=150000.013 +y_0=5400088.438 +ellps=intl
## +towgs84=-106.8686,52.2978,-103.7239,0.3366,-0.457,1.8422,-1.2747
## +units=m +no_defs
which can be used in other functions/methods or used to support the transformation of the coordinate reference system of spatial aware objects. Hence, transformation won’t work on a default vector or matrix object:
But after transformation of the single coordinate to a SpatialPoints object, the spTransform function can be used:
single_coordinate <- SpatialPoints(matrix(c(98710.32800000161,
162573.7030000016),
ncol = 2),
proj4string = wgs_lambert)
spTransform(single_coordinate, wgs_84)
## class : SpatialPoints
## features : 1
## extent : 3.641625, 3.641625, 50.7714, 50.7714 (xmin, xmax, ymin, ymax)
## coord. ref. : +init=epsg:4326 +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0
The application is similar to other spatial objects, such as a SpatialPointsDataFrame:
pts_recorded_wgs84 <- spTransform(pts_recorded, wgs_84)
pts_recorded_wgs84
## class : SpatialPointsDataFrame
## features : 14
## extent : 4.4543, 5.95192, 51.04185, 51.33847 (xmin, xmax, ymin, ymax)
## coord. ref. : +init=epsg:4326 +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0
## variables : 2
## names : identifiers, scientific_names
## min values : INBO:NBN:BFN00179000029DL, Lagarosiphon major
## max values : INBO:NBN:BFN001790000AACI, Muntiacus reevesii
EXERCISE
As your regularly encounter point coordinates that need conversion to another projection system, you decide to make a functions for future use:
For any data.frame(!) with coordinate information in a known coordinate reference system, update the columns with the x/y information to another coordinate system. The function returns the updated data.frame.
The function will probably look more or less like this:
reproject_points <- function(df, col_long, col_lat,
project_input, project_output){
# ...
return(df)
Following test should be able to work with the reproject_points function:
pts_ex2_test <- as.data.frame(pts_recorded)
ex2_test <- reproject_points(pts_ex2_test, "longitude", "latitude",
CRS("+init=epsg:31370"),
CRS("+init=epsg:4326"))
head(ex2_test)
## identifiers scientific_names longitude latitude
## 1 INBO:NBN:BFN00179000029DL Lagarosiphon major 4.45430 51.33847
## 2 INBO:NBN:BFN001790000A584 Lagarosiphon major 5.02789 51.24824
## 3 INBO:NBN:BFN001790000AACI Muntiacus reevesii 4.94202 51.24325
## 4 INBO:NBN:BFN0017900009DOC Muntiacus reevesii 4.49238 51.26218
## 5 INBO:NBN:BFN001790000A4J0 Muntiacus reevesii 4.49054 51.25701
## 6 INBO:NBN:BFN0017900009DO8 Muntiacus reevesii 4.54044 51.27518
Remark: This function is part of an inbo-rutil script, to guess a given projection of a given data set, when no CRS information is provided. The guess_projection function creates a leaflet plot with the coordinates in a variety of predefined coordinate reference systems.
rgdalMostly, the starting point of a data analysis involving spatial data is a data set in any kind of GIS format. The functionality to read vector data is provide by the readOGR function or the rgdal package:
deelbekkens <- readOGR("../data/deelbekkens/Deelbekken.shp")
## OGR data source with driver: ESRI Shapefile
## Source: "../data/deelbekkens/Deelbekken.shp", layer: "Deelbekken"
## with 102 features
## It has 8 fields
## Integer64 fields read as strings: UIDN OIDN
deelbekkens
## class : SpatialPolygonsDataFrame
## features : 102
## extent : 22041.22, 258878.5, 153054.6, 244027.1 (xmin, xmax, ymin, ymax)
## coord. ref. : +proj=lcc +lat_1=49.8333339 +lat_2=51.16666723333333 +lat_0=90 +lon_0=4.367486666666666 +x_0=150000.01256 +y_0=5400088.4378 +ellps=intl +units=m +no_defs
## variables : 8
## names : UIDN, OIDN, DEELBEKKEN, BEKNR, BEKNAAM, STRMGEB, OPPERVL, LENGTE
## min values : 1020, 1, 01-01, 1, Bekken Brugse polders, Brugse Polders, 100472147, 105596.74
## max values : 994, 99, 11-11, 9, Netebekken, Schelde, 98413503, 99220.36
plot(deelbekkens)
Besides shapefiles, other vector data formats can be read as well. Certainly geojson is a increasingly used data format for feature (vector) data sets:
eu_10grid <- readOGR("../data/EUgrid10.geojson")
## OGR data source with driver: GeoJSON
## Source: "../data/EUgrid10.geojson", layer: "OGRGeoJSON"
## with 377 features
## It has 3 fields
eu_10grid
## class : SpatialPolygonsDataFrame
## features : 377
## extent : 2.385175, 6.480442, 49.48023, 51.54315 (xmin, xmax, ymin, ymax)
## coord. ref. : +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0
## variables : 3
## names : CellCode, EofOrigin, NofOrigin
## min values : 10kmE379N312, 3790000, 2940000
## max values : 10kmE406N304, 4060000, 3160000
Actually, the set of data formats you can interact with, does not depend on R, but is dependent on your GDAL installation. To check if the installation supports a specific vector data format, you can get an overview of them by the ogrDrivers() command:
head(ogrDrivers()) # Just the first 6 records are shown here
## name long_name write copy isVector
## 1 AeronavFAA Aeronav FAA FALSE FALSE TRUE
## 2 AmigoCloud AmigoCloud TRUE FALSE TRUE
## 3 ARCGEN Arc/Info Generate FALSE FALSE TRUE
## 4 AVCBin Arc/Info Binary Coverage FALSE FALSE TRUE
## 5 AVCE00 Arc/Info E00 (ASCII) Coverage FALSE FALSE TRUE
## 6 BNA Atlas BNA TRUE FALSE TRUE
For data sets with multiple layers, the reading of a specific Layer is supported as well. For example, when the driver to read ESRI FileGeoDatabases is present, following commands will be useful as well:
# Information about the data (without actually reading the data itself)
ogrInfo(dsn = "name_filegeodatabase.gdb")
# List the names of the layers in the dataset
ogrListLayers(dsn = "name_filegeodatabase.gdb")
# Reading a specific layer
readOGR(dsn = "name_filegeodatabase.gdb", layer = "layer_of_interest"
Vector data represented as Spatial***DataFrame can be sub selected as a DataFrame. For example, selecting only the deelbekkens that are in the Netebekken (Boolean indexing):
nete <- deelbekkens[deelbekkens$BEKNAAM == "Netebekken", ]
plot(nete)
The result of the selection is again a SpatialPolygonsDataFrame. When only interested in the data itself, the selection of the attribute data alone can be done by converting the data to a DataFrame.
head(as.data.frame(nete))
## UIDN OIDN DEELBEKKEN BEKNR BEKNAAM STRMGEB OPPERVL LENGTE
## 10 982 82 10-03 10 Netebekken Schelde 122717152 59209.30
## 12 1020 88 10-09 10 Netebekken Schelde 122716486 64826.31
## 14 1041 86 10-07 10 Netebekken Schelde 119654760 66432.43
## 41 1169 87 10-08 10 Netebekken Schelde 92940654 49095.42
## 69 1227 80 10-01 10 Netebekken Schelde 132679768 58932.83
## 70 1228 89 10-10 10 Netebekken Schelde 174241356 90282.40
Hence, the geometry and projection data is no longer contained, only the data itself.
Remark: The sp data types do NOT support the dplyr package. If you want to use the %>% operator and the verbs such as filter, mutate,… as used in the dplyr package, you could check and install the spdplyr package. Another option is to start using the sf package, the newest package setup with the tidyverse environment in mind. This introduction and the documentation of of sf is good to start.
sp overlay and aggregationsThe sp package provides mostly the data types. However, the spTransform function is an essential function provided by the package itself. Two other functions of the sp package are worth mentioning: * disaggregate: groups of Line or Polygon (multi) are disaggregated to one Line/Polygon for each feature. When applied to the deelbekkens data set:
length(deelbekkens)
## [1] 102
deelbekkens_disaggr <- disaggregate(deelbekkens)
length(deelbekkens_disaggr)
## [1] 150
The number of features increases, as each polygon in the data set is now defined as an individual feature of the data set.
over(x, y) is used for spatial JOINsids <- over(spTransform(eu_10grid, wgs_lambert), pts_recorded)
ids <- na.omit(ids)
ids
## identifiers scientific_names
## 168 INBO:NBN:BFN0017900009DOC Muntiacus reevesii
## 169 INBO:NBN:BFN00179000029DL Lagarosiphon major
## 186 INBO:NBN:BFN0017900009DO8 Muntiacus reevesii
## 226 INBO:NBN:BFN001790000AACI Muntiacus reevesii
## 247 INBO:NBN:BFN001790000A584 Lagarosiphon major
## 331 INBO:NBN:BFN001790000A51G Muntiacus reevesii
The result provides the indices of the SpatialPolygonDataFrame together with the SpatialPointDataFrame data entries.
Remark Lines, and Polygon-Polygon overlays require rgeos
Check the vignette (vignette('over') in console) to get more details about all the different spatial JOIN options.
rgeos geometry operationsThe rgeos package provide a number of geometric operations, such as gArea(), gLength(), gDistance(), gBuffer(),… (single geometry) or gIntersection(), gUnion(), gContains,… (combining geometries).
As most commands of rgeos start with g***, use the power of the TAB-button to explore the available functions of the rgeos package: rgeos::g + TAB
These functions operate both on Spatial* objects (see above) as well as Spatial***DataFrame objects. Most of these functions speak for themselves (or read the manual). An important argument of these functions is byid. The byid argument determines if the function should be applied across sub geometries (TRUE) or the entire object (FALSE). Default is FALSE. For example, for the area calculation:
gArea(nete) # default is FALSE -> entire object
## [1] 1675885320
versus
gArea(nete, byid = TRUE)
## 10 12 14 41 69 70 80
## 122717152 122716487 119654760 92940654 132679768 174241355 87032046
## 89 90 91 92 93 101
## 128287765 69738882 93940592 145759867 226004155 160171837
Comparison of geometries involves two geometries:
vertical <- eu_10grid[grep('E400', eu_10grid$CellCode), ]
horizontal <- eu_10grid[grep('N311', eu_10grid$CellCode), ]
comparison <- gContains(vertical, horizontal,
byid = TRUE, checkValidity = TRUE,
returnDense = TRUE) # Test also with gIntersects
true_ids <- as.data.frame(which(comparison, arr.ind = TRUE)) # Find the TRUE elements
true_ids
## row col
## 310 21 18
# adjusting the colors transparency
blue <- adjustcolor("blue", alpha.f = 0.3)
red <- adjustcolor("darkred", alpha.f = 0.3)
# plot the cells selected by the geometry comparison:
plot(vertical)
plot(vertical[true_ids$col,], add = TRUE, col = blue)
plot(horizontal, add = TRUE)
plot(horizontal[true_ids$row,], add = TRUE, col = red)
EXERCISE
Make a leaflet plot (with open street map background) of the nete sub catchments, together with the Centroid for each of the sub geometries.
Tip 1: Looking for a command that matches the action centroid in R, is done by using ??centroid in the Console.
Tip 2: Remember that leaflet always expects WGS84 as CRS to make a plot(!)
The output should look like:
ggplotWhen working with DataFrame data, the application of ggplot2 is typical. Hence, it would be convenient to use ggplot2 (and the syntax) to plot Spatial***DataFrame as well. As ggplot2 only works on data.frame objects, a translation need to be made between the Spatial***DataFrame and a default data.frame.
For SpatialPointsDataFrame this is not really an issue, as the longitude and latitude columns provide the geometry information and can be used as such in ggplot by directly converting the data to a data.fram
pts_recorded_df <- as.data.frame(pts_recorded)
pts_recorded_df
## identifiers scientific_names longitude latitude
## 1 INBO:NBN:BFN00179000029DL Lagarosiphon major 155961.2 225412.3
## 2 INBO:NBN:BFN001790000A584 Lagarosiphon major 196022.8 215574.7
## 3 INBO:NBN:BFN001790000AACI Muntiacus reevesii 190031.5 214969.8
## 4 INBO:NBN:BFN0017900009DOC Muntiacus reevesii 158629.2 216928.2
## 5 INBO:NBN:BFN001790000A4J0 Muntiacus reevesii 158501.7 216352.8
## 6 INBO:NBN:BFN0017900009DO8 Muntiacus reevesii 161980.6 218381.2
## 7 INBO:NBN:BFN001790000AACF Muntiacus reevesii 260392.8 223200.1
## 8 INBO:NBN:BFN001790000A4JO Muntiacus reevesii 158810.1 216466.8
## 9 INBO:NBN:BFN001790000A4K8 Muntiacus reevesii 158868.1 216804.0
## 10 INBO:NBN:BFN001790000A51G Muntiacus reevesii 242186.7 193225.7
## 11 INBO:NBN:BFN001790000A4J3 Muntiacus reevesii 158501.7 216352.8
## 12 INBO:NBN:BFN001790000A4J5 Muntiacus reevesii 158501.7 216352.8
## 13 INBO:NBN:BFN001790000A4K6 Muntiacus reevesii 158868.1 216804.0
## 14 INBO:NBN:BFN001790000A4L5 Muntiacus reevesii 158420.3 216635.3
ggplot(pts_recorded_df, aes(x = longitude, y = latitude)) +
geom_point() +
geom_text(size = 4, aes(label = scientific_names),
check_overlap = TRUE, hjust = -0.1) +
xlim(150000, 300000)
## Warning in plyr::split_indices(scale_id, n): '.Random.seed' is not an
## integer vector but of type 'NULL', so ignored
For Lines and Polygons, the translation requires an entire interpretation of the configuration towards a data.frame. This functionality is provided by fortify, which translates the spatial information:
nete_df <- fortify(nete)
head(nete_df)
## long lat order hole piece id group
## 1 188940.7 198339.7 1 FALSE 1 10 10.1
## 2 189035.4 198326.5 2 FALSE 1 10 10.1
## 3 189248.9 198329.8 3 FALSE 1 10 10.1
## 4 189469.3 198354.7 4 FALSE 1 10 10.1
## 5 189726.5 198403.7 5 FALSE 1 10 10.1
## 6 189889.0 198438.6 6 FALSE 1 10 10.1
class(nete_df)
## [1] "data.frame"
The individual columns of the resulting data.frame describe the following elements:
long and lat: x and y coordinatesorder: sequence of the polygones or lineshole: identifies whether the feature the coordinates are part of define a holepiece: collects all the points that make a single feature (polygon/line), i.e. 1 attributes table line • group: defines the subpolygons or sublines (in case of multipolygon/multiline features) • id: row identifiers of the attribute table (data)Still, the added value of using ggplot2 is to use the attribute data to define colors,… As the attribute data table is available and we have an identifier id, we can JOIN the attribute data to the spatial information after retrieving the attribute data with the as.data.frame function:
nete_df <- merge(nete_df, as.data.frame(nete))
head(nete_df)
## long lat order hole piece id group UIDN OIDN DEELBEKKEN BEKNR
## 1 188940.7 198339.7 1 FALSE 1 10 10.1 982 82 10-03 10
## 2 189035.4 198326.5 2 FALSE 1 10 10.1 982 82 10-03 10
## 3 189248.9 198329.8 3 FALSE 1 10 10.1 982 82 10-03 10
## 4 189469.3 198354.7 4 FALSE 1 10 10.1 982 82 10-03 10
## 5 189726.5 198403.7 5 FALSE 1 10 10.1 982 82 10-03 10
## 6 189889.0 198438.6 6 FALSE 1 10 10.1 982 82 10-03 10
## BEKNAAM STRMGEB OPPERVL LENGTE
## 1 Netebekken Schelde 122717152 59209.3
## 2 Netebekken Schelde 122717152 59209.3
## 3 Netebekken Schelde 122717152 59209.3
## 4 Netebekken Schelde 122717152 59209.3
## 5 Netebekken Schelde 122717152 59209.3
## 6 Netebekken Schelde 122717152 59209.3
REMEMBER
In order to plot SpatialLinesDataFrame or SpatialPolygonsDataFrame data, the spatial data (with fortify()) need to be merged with the attribute data (with as.data.frame) to have a ggplot2 compatible data.frame!
This seems to be a useful small tooling for a custom R function…
EXERCISE
Write a function called prepare_spatial_for_ggplot that takes a SpatialLinesDataFrame or SpatialPolygonsDataFrame as input and execute the necessary steps (described above: fortify/as.data.frame/merge) to enable the plotting with ggplot2. The output of the function is a data.frame.
Following test should be able to work with the prepare_spatial_for_ggplot function:
ex4_test <- prepare_spatial_for_ggplot(nete)
head(ex4_test)
## long lat order hole piece id group UIDN OIDN DEELBEKKEN BEKNR
## 1 188940.7 198339.7 1 FALSE 1 10 10.1 982 82 10-03 10
## 2 189035.4 198326.5 2 FALSE 1 10 10.1 982 82 10-03 10
## 3 189248.9 198329.8 3 FALSE 1 10 10.1 982 82 10-03 10
## 4 189469.3 198354.7 4 FALSE 1 10 10.1 982 82 10-03 10
## 5 189726.5 198403.7 5 FALSE 1 10 10.1 982 82 10-03 10
## 6 189889.0 198438.6 6 FALSE 1 10 10.1 982 82 10-03 10
## BEKNAAM STRMGEB OPPERVL LENGTE
## 1 Netebekken Schelde 122717152 59209.3
## 2 Netebekken Schelde 122717152 59209.3
## 3 Netebekken Schelde 122717152 59209.3
## 4 Netebekken Schelde 122717152 59209.3
## 5 Netebekken Schelde 122717152 59209.3
## 6 Netebekken Schelde 122717152 59209.3
For plotting Lines, use the ggplot function geom_path() (and NOT geom_line()). For polygons, you can choose: * geom_path(): only plot the borders * geom_polygon(): usa a fill color for the polygons
ggplot(nete_df, aes(x = long, y = lat)) +
geom_polygon(aes(group = group, fill = id)) +
coord_fixed()
Remark: Adding group = group (grouping according to the by fortify derived groups) is required to plot the individual polygons separately(!)
ggmapThe ggmap package is the spatial plotting extension to ggplot2. It provides the ability to integrate the graphs of ggplot2 with spatial backgrounds coming from online resources such as Google maps, stamen maps and OpenStreetMap. The most convenient way to setup these background maps is to use the ggmap::get_*** functions. For example, using the toner-background of stamen:
extent <- c(left = 2.54, bottom = 49.46, right = 6.4, top = 51.51)
map <- get_stamenmap(extent, zoom = 8, maptype = "toner-background")
belgium <- ggmap(map)
belgium
Just as with the OpenStreetMap integration of the leaflet package, these spatial background coming from online resources use WGS84 (decimal degrees). Make sure to convert the vector data to the WGS84 CRS:
nete_wgs84 <- spTransform(nete, wgs_84)
nete_df_wgs84 <- fortify(nete_wgs84)
nete_df_wgs84 <- merge(nete_df_wgs84, as.data.frame(nete_wgs84))
belgium +
geom_path(aes(x = long, y = lat, group = group),
data = nete_df_wgs84, colour = "red")
The bbox can be defined more specifically to improve the boundaries of the background image. Focusing on the Nete catchment:
extent <- c(left = 4.42, bottom = 50.9, right = 5.4, top = 51.4)
map <- get_stamenmap(bbox = extent, zoom = 10, maptype = "toner-hybrid")
ggmap(map) +
geom_path(aes(x = long, y = lat, group = group),
data = nete_df_wgs84, colour = "red")
Remark: each of these online providers have different maptype options. Check the help of these functions to see the options (e.g. ?get_stamenmap).
The usage of Google maps enables the option to provide a google maps search as center of the image
map <- get_googlemap(center = "Geel Belgium", scale = 2,
maptype = "terrain", zoom = 9)
ggmap(map) +
geom_polygon(aes(x = long, y = lat, group = group, fill = id, alpha = 0.5),
data = nete_df_wgs84, colour = "white")
The Github repository and the paper provide more introductionary material to get you started with ggmap. Another potentially useful plotting package is the tmap package.
EXERCISE
Pick any spatial vector data set (shapefile, geojson,…) currently on your computer:
read the file into R as a Spatial***DataFramelength > 10000)ggplot with the color of each spatial entity defined by a chosen data columns